N-gram-based versus phrase-based statistical machine translation
Abstract
This work summarizes a comparison between two approaches to Statistical Machine Translation (SMT): N-gram-based and phrase-based SMT. In both approaches, the translation process is based on bilingual units related by word-to-word alignments (pairs of source and target words); the main differences lie in how these units are extracted and in how the translation context is modeled statistically. The study was carried out on two translation tasks that differ in translation difficulty and in the amount of available training data, with distortion (reordering) allowed in the decoding process. It thus extends previous work in which both approaches were compared under monotone conditions. Finally, we report comparative results in terms of translation accuracy, computation time, and memory size. The results show that the N-gram-based approach outperforms the phrase-based approach, achieving similar accuracy scores in less computation time and with lower memory requirements.
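The difference in how translation context is modeled can be illustrated with a toy sketch. It is not the systems compared in the paper: the corpus, the segmentation into units, and all function names are invented for illustration, the N-gram side is reduced to an unsmoothed bigram over bilingual units, and the phrase-based side is reduced to a context-independent relative-frequency table (real phrase-based systems add a target language model and further features).

```python
from collections import Counter
import math

# Hypothetical toy corpus: each sentence pair is already segmented into
# bilingual units (source_side, target_side). Invented for illustration only.
CORPUS = [
    [("la", "the"), ("casa", "house"), ("blanca", "white")],
    [("la", "the"), ("casa", "house"), ("grande", "big")],
    [("una", "a"), ("casa", "house"), ("blanca", "white")],
]
BOS = ("<s>", "<s>")  # sentence-start marker for the unit bigram


def train_unit_bigram(corpus):
    """N-gram-style: count each unit together with the unit that precedes it."""
    bigrams, contexts = Counter(), Counter()
    for sent in corpus:
        prev = BOS
        for unit in sent:
            bigrams[(prev, unit)] += 1
            contexts[prev] += 1
            prev = unit
    return bigrams, contexts


def train_unit_table(corpus):
    """Phrase-style: count each unit on its own, ignoring neighbouring units."""
    pairs, sources = Counter(), Counter()
    for sent in corpus:
        for src, tgt in sent:
            pairs[(src, tgt)] += 1
            sources[src] += 1
    return pairs, sources


def bigram_logprob(sent, bigrams, contexts):
    """Sum of log p(unit_k | unit_{k-1}); -inf for unseen events (no smoothing)."""
    total, prev = 0.0, BOS
    for unit in sent:
        count = bigrams[(prev, unit)]
        if count == 0:
            return float("-inf")
        total += math.log(count / contexts[prev])
        prev = unit
    return total


def table_logprob(sent, pairs, sources):
    """Sum of log p(target | source), each unit scored independently."""
    total = 0.0
    for src, tgt in sent:
        count = pairs[(src, tgt)]
        if count == 0:
            return float("-inf")
        total += math.log(count / sources[src])
    return total


bigrams, contexts = train_unit_bigram(CORPUS)
pairs, sources = train_unit_table(CORPUS)
example = CORPUS[0]
ngram_score = bigram_logprob(example, bigrams, contexts)
table_score = table_logprob(example, pairs, sources)
```

In this sketch the bigram score depends on the order and neighbourhood of the units, while the table score would be unchanged if the same units appeared in any other order.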
Related resources
Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models
This work extends phrase-based statistical MT (SMT) with shallow syntax dependencies. Two string-to-chunks translation models are proposed: a factored model, which augments phrase-based SMT with layered dependencies, and a joint model that extends the phrase translation table with microtags, i.e. per-word projections of chunk labels. Both rely on n-gram models of target sequences with different...
N-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual ...
Connecting Phrase based Statistical Machine Translation Adaptation
Although more corpora are now available for Statistical Machine Translation (SMT), only those from the same domain as the original corpus, or a similar one, can directly enhance SMT performance. Most existing adaptation methods focus on sentence selection. In comparison, a phrase is a smaller and more fine-grained unit for data selection; therefore we propose a s...
An Investigation of the Sampling-Based Alignment Method and Its Contributions
By investigating the distribution of phrase pairs in phrase translation tables, the work in this paper describes an approach to increase the number of n-gram alignments in phrase translation tables output by a sampling-based alignment method. This approach consists in enforcing the alignment of n-grams in distinct translation subtables so as to increase the number of n-grams. Standard normal di...
Analysis and System Combination of Phrase- and N-Gram-Based Statistical Machine Translation Systems
In the framework of the Tc-Star project, we analyze and propose a combination of two Statistical Machine Translation systems: a phrase-based and an N-gram-based one. The exhaustive analysis includes a comparison of the translation models in terms of efficiency (number of translation units used in the search and computational time) and an examination of the errors in each system’s output. Addit...
Publication date: 2005